Overview

Dataset statistics

Number of variables: 32
Number of observations: 119390
Missing cells: 129425
Missing cells (%): 3.4%
Duplicate rows: 31994
Duplicate rows (%): 26.8%
Total size in memory: 29.1 MiB
Average record size in memory: 256.0 B
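
The percentages in this overview follow directly from the counts. As a minimal sketch (plain Python, with the per-variable missing counts copied from the Variables section of this report), the arithmetic can be checked like this:

```python
# Counts copied from the dataset statistics in this report.
n_rows, n_cols = 119390, 32
missing_cells = 129425
duplicate_rows = 31994

# Percentages are taken over all cells and all rows respectively.
total_cells = n_rows * n_cols
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_rows

# Per-variable missing counts as listed under each variable.
missing_by_variable = {"agent": 16340, "company": 112593, "country": 488, "children": 4}

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print("Per-variable counts add up:", sum(missing_by_variable.values()) == missing_cells)
```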

Variable types

NUM: 17
CAT: 13
BOOL: 2

Warnings

Dataset has 31994 (26.8%) duplicate rows [Duplicates]
country has a high cardinality: 177 distinct values [High cardinality]
reservation_status_date has a high cardinality: 926 distinct values [High cardinality]
agent has 16340 (13.7%) missing values [Missing]
company has 112593 (94.3%) missing values [Missing]
babies is highly skewed (γ1 = 24.64654483) [Skewed]
previous_cancellations is highly skewed (γ1 = 24.45804872) [Skewed]
previous_bookings_not_canceled is highly skewed (γ1 = 23.53979995) [Skewed]
lead_time has 6345 (5.3%) zeros [Zeros]
stays_in_weekend_nights has 51998 (43.6%) zeros [Zeros]
stays_in_week_nights has 7645 (6.4%) zeros [Zeros]
children has 110796 (92.8%) zeros [Zeros]
babies has 118473 (99.2%) zeros [Zeros]
previous_cancellations has 112906 (94.6%) zeros [Zeros]
previous_bookings_not_canceled has 115770 (97.0%) zeros [Zeros]
booking_changes has 101314 (84.9%) zeros [Zeros]
days_in_waiting_list has 115692 (96.9%) zeros [Zeros]
adr has 1959 (1.6%) zeros [Zeros]
required_car_parking_spaces has 111974 (93.8%) zeros [Zeros]
total_of_special_requests has 70318 (58.9%) zeros [Zeros]
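
The skewness warnings report γ1, the standardized third moment. The sketch below computes the plain moment-based version; the report's own values may come from a bias-corrected estimator, so treat this as an illustration rather than the report's implementation. The toy column is hypothetical and only mimics a mostly-zero variable such as babies.

```python
def moment_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy column: 99 zeros and one large value.
toy = [0] * 99 + [10]
print(moment_skewness(toy))              # strongly positive: long right tail
print(moment_skewness([1, 2, 3, 4, 5]))  # symmetric data gives zero skewness
```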

Reproduction

Analysis started: 2020-11-23 18:43:16.503022
Analysis finished: 2020-11-23 18:44:26.424532
Duration: 1 minute and 9.92 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

hotel
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
City Hotel | 79330 | 66.4%
Resort Hotel | 40060 | 33.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 12
Median length: 10
Mean length: 10.67107798
Min length: 10

is_canceled
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
0 | 75166 | 63.0%
1 | 44224 | 37.0%

lead_time
Real number (ℝ≥0)

ZEROS

Distinct: 479
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104.0114164
Minimum: 0
Maximum: 737
Zeros: 6345
Zeros (%): 5.3%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 18
Median: 69
Q3: 160
95-th percentile: 320
Maximum: 737
Range: 737
Interquartile range (IQR): 142

Descriptive statistics

Standard deviation: 106.863097
Coefficient of variation (CV): 1.027416997
Kurtosis: 1.696448849
Mean: 104.0114164
Median Absolute Deviation (MAD): 60
Skewness: 1.346549873
Sum: 12417923
Variance: 11419.72151
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 6345 | 5.3%
1 | 3460 | 2.9%
2 | 2069 | 1.7%
3 | 1816 | 1.5%
4 | 1715 | 1.4%
5 | 1565 | 1.3%
6 | 1445 | 1.2%
7 | 1331 | 1.1%
8 | 1138 | 1.0%
12 | 1079 | 0.9%
Other values (469) | 97427 | 81.6%

Value | Count | Frequency (%)
0 | 6345 | 5.3%
1 | 3460 | 2.9%
2 | 2069 | 1.7%
3 | 1816 | 1.5%
4 | 1715 | 1.4%

Value | Count | Frequency (%)
737 | 1 | < 0.1%
709 | 1 | < 0.1%
629 | 17 | < 0.1%
626 | 30 | < 0.1%
622 | 17 | < 0.1%
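
Several of the statistics reported for each numeric variable are simple functions of others. As a rough sketch using only the standard library (the quantile interpolation method is an assumption and may differ from the one the report uses), the IQR, coefficient of variation and median absolute deviation can be computed as follows, and the lead_time figures can be cross-checked against each other:

```python
import statistics

def spread_summary(values):
    """Return IQR, coefficient of variation and median absolute deviation."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return {"iqr": q3 - q1, "cv": std / mean, "mad": mad}

print(spread_summary([1, 2, 3, 4, 5]))

# Consistency check on the lead_time values reported in this section.
lead_time_cv = 106.863097 / 104.0114164  # standard deviation / mean
lead_time_iqr = 160 - 18                 # Q3 - Q1
print(round(lead_time_cv, 4), lead_time_iqr)
```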
 

arrival_date_year
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
2016 | 56707 | 47.5%
2017 | 40687 | 34.1%
2015 | 21996 | 18.4%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

arrival_date_month
Categorical

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
August | 13877 | 11.6%
July | 12661 | 10.6%
May | 11791 | 9.9%
October | 11160 | 9.3%
April | 11089 | 9.3%
June | 10939 | 9.2%
September | 10508 | 8.8%
March | 9794 | 8.2%
February | 8068 | 6.8%
November | 6794 | 5.7%
Other values (2) | 12709 | 10.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 6
Mean length: 5.903182846
Min length: 3

arrival_date_week_number
Real number (ℝ≥0)

Distinct: 53
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.16517296
Minimum: 1
Maximum: 53
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 16
Median: 28
Q3: 38
95-th percentile: 49
Maximum: 53
Range: 52
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 13.60513836
Coefficient of variation (CV): 0.500830176
Kurtosis: -0.9860771763
Mean: 27.16517296
Median Absolute Deviation (MAD): 11
Skewness: -0.01001432604
Sum: 3243250
Variance: 185.0997897
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
33 | 3580 | 3.0%
30 | 3087 | 2.6%
32 | 3045 | 2.6%
34 | 3040 | 2.5%
18 | 2926 | 2.5%
21 | 2854 | 2.4%
28 | 2853 | 2.4%
17 | 2805 | 2.3%
20 | 2785 | 2.3%
29 | 2763 | 2.3%
Other values (43) | 89652 | 75.1%

Value | Count | Frequency (%)
1 | 1047 | 0.9%
2 | 1218 | 1.0%
3 | 1319 | 1.1%
4 | 1487 | 1.2%
5 | 1387 | 1.2%

Value | Count | Frequency (%)
53 | 1816 | 1.5%
52 | 1195 | 1.0%
51 | 933 | 0.8%
50 | 1505 | 1.3%
49 | 1782 | 1.5%

arrival_date_day_of_month
Real number (ℝ≥0)

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.79824106
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 8
Median: 16
Q3: 23
95-th percentile: 30
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.780829471
Coefficient of variation (CV): 0.5558105765
Kurtosis: -1.187168319
Mean: 15.79824106
Median Absolute Deviation (MAD): 8
Skewness: -0.002000453979
Sum: 1886152
Variance: 77.10296619
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=31)

Value | Count | Frequency (%)
17 | 4406 | 3.7%
5 | 4317 | 3.6%
15 | 4196 | 3.5%
25 | 4160 | 3.5%
26 | 4147 | 3.5%
9 | 4096 | 3.4%
12 | 4087 | 3.4%
16 | 4078 | 3.4%
2 | 4055 | 3.4%
19 | 4052 | 3.4%
Other values (21) | 77796 | 65.2%

Value | Count | Frequency (%)
1 | 3626 | 3.0%
2 | 4055 | 3.4%
3 | 3855 | 3.2%
4 | 3763 | 3.2%
5 | 4317 | 3.6%

Value | Count | Frequency (%)
31 | 2208 | 1.8%
30 | 3853 | 3.2%
29 | 3580 | 3.0%
28 | 3946 | 3.3%
27 | 3802 | 3.2%

stays_in_weekend_nights
Real number (ℝ≥0)

ZEROS

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9275986264
Minimum: 0
Maximum: 19
Zeros: 51998
Zeros (%): 43.6%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 2
Maximum: 19
Range: 19
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 0.9986134946
Coefficient of variation (CV): 1.076557755
Kurtosis: 7.174066064
Mean: 0.9275986264
Median Absolute Deviation (MAD): 1
Skewness: 1.38004645
Sum: 110746
Variance: 0.9972289116
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=17)

Value | Count | Frequency (%)
0 | 51998 | 43.6%
2 | 33308 | 27.9%
1 | 30626 | 25.7%
4 | 1855 | 1.6%
3 | 1259 | 1.1%
6 | 153 | 0.1%
5 | 79 | 0.1%
8 | 60 | 0.1%
7 | 19 | < 0.1%
9 | 11 | < 0.1%
Other values (7) | 22 | < 0.1%

Value | Count | Frequency (%)
0 | 51998 | 43.6%
1 | 30626 | 25.7%
2 | 33308 | 27.9%
3 | 1259 | 1.1%
4 | 1855 | 1.6%

Value | Count | Frequency (%)
19 | 1 | < 0.1%
18 | 1 | < 0.1%
16 | 3 | < 0.1%
14 | 2 | < 0.1%
13 | 3 | < 0.1%

stays_in_week_nights
Real number (ℝ≥0)

ZEROS

Distinct: 35
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.500301533
Minimum: 0
Maximum: 50
Zeros: 7645
Zeros (%): 6.4%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 3
95-th percentile: 5
Maximum: 50
Range: 50
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.908285615
Coefficient of variation (CV): 0.7632221914
Kurtosis: 24.28455482
Mean: 2.500301533
Median Absolute Deviation (MAD): 1
Skewness: 2.862249242
Sum: 298511
Variance: 3.641553989
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=35)

Value | Count | Frequency (%)
2 | 33684 | 28.2%
1 | 30310 | 25.4%
3 | 22258 | 18.6%
5 | 11077 | 9.3%
4 | 9563 | 8.0%
0 | 7645 | 6.4%
6 | 1499 | 1.3%
10 | 1036 | 0.9%
7 | 1029 | 0.9%
8 | 656 | 0.5%
Other values (25) | 633 | 0.5%

Value | Count | Frequency (%)
0 | 7645 | 6.4%
1 | 30310 | 25.4%
2 | 33684 | 28.2%
3 | 22258 | 18.6%
4 | 9563 | 8.0%

Value | Count | Frequency (%)
50 | 1 | < 0.1%
42 | 1 | < 0.1%
41 | 1 | < 0.1%
40 | 2 | < 0.1%
35 | 1 | < 0.1%

adults
Real number (ℝ≥0)

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.856403384
Minimum: 0
Maximum: 55
Zeros: 403
Zeros (%): 0.3%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 2
Q3: 2
95-th percentile: 3
Maximum: 55
Range: 55
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5792609988
Coefficient of variation (CV): 0.3120340137
Kurtosis: 1352.115116
Mean: 1.856403384
Median Absolute Deviation (MAD): 0
Skewness: 18.31780476
Sum: 221636
Variance: 0.3355433048
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=14)

Value | Count | Frequency (%)
2 | 89680 | 75.1%
1 | 23027 | 19.3%
3 | 6202 | 5.2%
0 | 403 | 0.3%
4 | 62 | 0.1%
26 | 5 | < 0.1%
27 | 2 | < 0.1%
20 | 2 | < 0.1%
5 | 2 | < 0.1%
55 | 1 | < 0.1%
Other values (4) | 4 | < 0.1%

Value | Count | Frequency (%)
0 | 403 | 0.3%
1 | 23027 | 19.3%
2 | 89680 | 75.1%
3 | 6202 | 5.2%
4 | 62 | 0.1%

Value | Count | Frequency (%)
55 | 1 | < 0.1%
50 | 1 | < 0.1%
40 | 1 | < 0.1%
27 | 2 | < 0.1%
26 | 5 | < 0.1%

children
Real number (ℝ≥0)

ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 4
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1038899033
Minimum: 0
Maximum: 10
Zeros: 110796
Zeros (%): 92.8%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3985614448
Coefficient of variation (CV): 3.836382863
Kurtosis: 18.67369236
Mean: 0.1038899033
Median Absolute Deviation (MAD): 0
Skewness: 4.112589542
Sum: 12403
Variance: 0.1588512253
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=5)

Value | Count | Frequency (%)
0 | 110796 | 92.8%
1 | 4861 | 4.1%
2 | 3652 | 3.1%
3 | 76 | 0.1%
10 | 1 | < 0.1%
(Missing) | 4 | < 0.1%

Value | Count | Frequency (%)
0 | 110796 | 92.8%
1 | 4861 | 4.1%
2 | 3652 | 3.1%
3 | 76 | 0.1%
10 | 1 | < 0.1%

Value | Count | Frequency (%)
10 | 1 | < 0.1%
3 | 76 | 0.1%
2 | 3652 | 3.1%
1 | 4861 | 4.1%
0 | 110796 | 92.8%

babies
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.007948739425
Minimum: 0
Maximum: 10
Zeros: 118473
Zeros (%): 99.2%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 10
Range: 10
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.0974361913
Coefficient of variation (CV): 12.25806837
Kurtosis: 1633.948235
Mean: 0.007948739425
Median Absolute Deviation (MAD): 0
Skewness: 24.64654483
Sum: 949
Variance: 0.009493811375
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=5)

Value | Count | Frequency (%)
0 | 118473 | 99.2%
1 | 900 | 0.8%
2 | 15 | < 0.1%
10 | 1 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
0 | 118473 | 99.2%
1 | 900 | 0.8%
2 | 15 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Value | Count | Frequency (%)
10 | 1 | < 0.1%
9 | 1 | < 0.1%
2 | 15 | < 0.1%
1 | 900 | 0.8%
0 | 118473 | 99.2%

meal
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
BB | 92310 | 77.3%
HB | 14463 | 12.1%
SC | 10650 | 8.9%
Undefined | 1169 | 1.0%
FB | 798 | 0.7%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 2
Mean length: 2.068540079
Min length: 2

country
Categorical

HIGH CARDINALITY

Distinct: 177
Distinct (%): 0.1%
Missing: 488
Missing (%): 0.4%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
PRT | 48590 | 40.7%
GBR | 12129 | 10.2%
FRA | 10415 | 8.7%
ESP | 8568 | 7.2%
DEU | 7287 | 6.1%
ITA | 3766 | 3.2%
IRL | 3375 | 2.8%
BEL | 2342 | 2.0%
BRA | 2224 | 1.9%
NLD | 2104 | 1.8%
Other values (167) | 18102 | 15.2%

Frequencies of value counts

Unique

Unique: 30
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 2.98928721
Min length: 2

market_segment
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
Online TA | 56477 | 47.3%
Offline TA/TO | 24219 | 20.3%
Groups | 19811 | 16.6%
Direct | 12606 | 10.6%
Corporate | 5295 | 4.4%
Complementary | 743 | 0.6%
Aviation | 237 | 0.2%
Undefined | 2 | < 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 13
Median length: 9
Mean length: 9.01976715
Min length: 6

distribution_channel
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
TA/TO | 97870 | 82.0%
Direct | 14645 | 12.3%
Corporate | 6677 | 5.6%
GDS | 193 | 0.2%
Undefined | 5 | < 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 5
Mean length: 5.343303459
Min length: 3

is_repeated_guest
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
0 | 115580 | 96.8%
1 | 3810 | 3.2%

previous_cancellations
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.08711784907
Minimum: 0
Maximum: 26
Zeros: 112906
Zeros (%): 94.6%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 26
Range: 26
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8443363842
Coefficient of variation (CV): 9.691887405
Kurtosis: 674.0736926
Mean: 0.08711784907
Median Absolute Deviation (MAD): 0
Skewness: 24.45804872
Sum: 10401
Variance: 0.7129039296
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=15)

Value | Count | Frequency (%)
0 | 112906 | 94.6%
1 | 6051 | 5.1%
2 | 116 | 0.1%
3 | 65 | 0.1%
24 | 48 | < 0.1%
11 | 35 | < 0.1%
4 | 31 | < 0.1%
26 | 26 | < 0.1%
25 | 25 | < 0.1%
6 | 22 | < 0.1%
Other values (5) | 65 | 0.1%

Value | Count | Frequency (%)
0 | 112906 | 94.6%
1 | 6051 | 5.1%
2 | 116 | 0.1%
3 | 65 | 0.1%
4 | 31 | < 0.1%

Value | Count | Frequency (%)
26 | 26 | < 0.1%
25 | 25 | < 0.1%
24 | 48 | < 0.1%
21 | 1 | < 0.1%
19 | 19 | < 0.1%

previous_bookings_not_canceled
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 73
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1370969093
Minimum: 0
Maximum: 72
Zeros: 115770
Zeros (%): 97.0%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 72
Range: 72
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.497436848
Coefficient of variation (CV): 10.92246977
Kurtosis: 767.2452097
Mean: 0.1370969093
Median Absolute Deviation (MAD): 0
Skewness: 23.53979995
Sum: 16368
Variance: 2.242317113
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 115770 | 97.0%
1 | 1542 | 1.3%
2 | 580 | 0.5%
3 | 333 | 0.3%
4 | 229 | 0.2%
5 | 181 | 0.2%
6 | 115 | 0.1%
7 | 88 | 0.1%
8 | 70 | 0.1%
9 | 60 | 0.1%
Other values (63) | 422 | 0.4%

Value | Count | Frequency (%)
0 | 115770 | 97.0%
1 | 1542 | 1.3%
2 | 580 | 0.5%
3 | 333 | 0.3%
4 | 229 | 0.2%

Value | Count | Frequency (%)
72 | 1 | < 0.1%
71 | 1 | < 0.1%
70 | 1 | < 0.1%
69 | 1 | < 0.1%
68 | 1 | < 0.1%

reserved_room_type
Categorical

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
A | 85994 | 72.0%
D | 19201 | 16.1%
E | 6535 | 5.5%
F | 2897 | 2.4%
G | 2094 | 1.8%
B | 1118 | 0.9%
C | 932 | 0.8%
H | 601 | 0.5%
P | 12 | < 0.1%
L | 6 | < 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

assigned_room_type
Categorical

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
A | 74053 | 62.0%
D | 25322 | 21.2%
E | 7806 | 6.5%
F | 3751 | 3.1%
G | 2553 | 2.1%
C | 2375 | 2.0%
B | 2163 | 1.8%
H | 712 | 0.6%
I | 363 | 0.3%
K | 279 | 0.2%
Other values (2) | 13 | < 0.1%

Frequencies of value counts

Unique

Unique: 1
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

booking_changes
Real number (ℝ≥0)

ZEROS

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2211240472
Minimum: 0
Maximum: 21
Zeros: 101314
Zeros (%): 84.9%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 21
Range: 21
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6523055727
Coefficient of variation (CV): 2.949953118
Kurtosis: 79.39360467
Mean: 0.2211240472
Median Absolute Deviation (MAD): 0
Skewness: 6.000270054
Sum: 26400
Variance: 0.4255025601
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=21)

Value | Count | Frequency (%)
0 | 101314 | 84.9%
1 | 12701 | 10.6%
2 | 3805 | 3.2%
3 | 927 | 0.8%
4 | 376 | 0.3%
5 | 118 | 0.1%
6 | 63 | 0.1%
7 | 31 | < 0.1%
8 | 17 | < 0.1%
9 | 8 | < 0.1%
Other values (11) | 30 | < 0.1%

Value | Count | Frequency (%)
0 | 101314 | 84.9%
1 | 12701 | 10.6%
2 | 3805 | 3.2%
3 | 927 | 0.8%
4 | 376 | 0.3%

Value | Count | Frequency (%)
21 | 1 | < 0.1%
20 | 1 | < 0.1%
18 | 1 | < 0.1%
17 | 2 | < 0.1%
16 | 2 | < 0.1%

deposit_type
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
No Deposit | 104641 | 87.6%
Non Refund | 14587 | 12.2%
Refundable | 162 | 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

agent
Real number (ℝ≥0)

MISSING

Distinct: 333
Distinct (%): 0.3%
Missing: 16340
Missing (%): 13.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 86.69338185
Minimum: 1
Maximum: 535
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 9
Median: 14
Q3: 229
95-th percentile: 250
Maximum: 535
Range: 534
Interquartile range (IQR): 220

Descriptive statistics

Standard deviation: 110.7745476
Coefficient of variation (CV): 1.277773981
Kurtosis: -0.007179564938
Mean: 86.69338185
Median Absolute Deviation (MAD): 13
Skewness: 1.089385636
Sum: 8933753
Variance: 12271.00041
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
9 | 31961 | 26.8%
240 | 13922 | 11.7%
1 | 7191 | 6.0%
14 | 3640 | 3.0%
7 | 3539 | 3.0%
6 | 3290 | 2.8%
250 | 2870 | 2.4%
241 | 1721 | 1.4%
28 | 1666 | 1.4%
8 | 1514 | 1.3%
Other values (323) | 31736 | 26.6%
(Missing) | 16340 | 13.7%

Value | Count | Frequency (%)
1 | 7191 | 6.0%
2 | 162 | 0.1%
3 | 1336 | 1.1%
4 | 47 | < 0.1%
5 | 330 | 0.3%

Value | Count | Frequency (%)
535 | 3 | < 0.1%
531 | 68 | 0.1%
527 | 35 | < 0.1%
526 | 10 | < 0.1%
510 | 2 | < 0.1%

company
Real number (ℝ≥0)

MISSING

Distinct: 352
Distinct (%): 5.2%
Missing: 112593
Missing (%): 94.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 189.2667353
Minimum: 6
Maximum: 543
Zeros: 0
Zeros (%): 0.0%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 6
5-th percentile: 40
Q1: 62
Median: 179
Q3: 270
95-th percentile: 435
Maximum: 543
Range: 537
Interquartile range (IQR): 208

Descriptive statistics

Standard deviation: 131.6550146
Coefficient of variation (CV): 0.6956056721
Kurtosis: -0.4907952103
Mean: 189.2667353
Median Absolute Deviation (MAD): 111
Skewness: 0.6015996673
Sum: 1286446
Variance: 17333.04288
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
40 | 927 | 0.8%
223 | 784 | 0.7%
67 | 267 | 0.2%
45 | 250 | 0.2%
153 | 215 | 0.2%
174 | 149 | 0.1%
219 | 141 | 0.1%
281 | 138 | 0.1%
154 | 133 | 0.1%
405 | 119 | 0.1%
Other values (342) | 3674 | 3.1%
(Missing) | 112593 | 94.3%

Value | Count | Frequency (%)
6 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 37 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%

Value | Count | Frequency (%)
543 | 2 | < 0.1%
541 | 1 | < 0.1%
539 | 2 | < 0.1%
534 | 2 | < 0.1%
531 | 1 | < 0.1%

days_in_waiting_list
Real number (ℝ≥0)

ZEROS

Distinct: 128
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.321149175
Minimum: 0
Maximum: 391
Zeros: 115692
Zeros (%): 96.9%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 391
Range: 391
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17.59472088
Coefficient of variation (CV): 7.580176694
Kurtosis: 186.7930696
Mean: 2.321149175
Median Absolute Deviation (MAD): 0
Skewness: 11.94435345
Sum: 277122
Variance: 309.5742028
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 115692 | 96.9%
39 | 227 | 0.2%
58 | 164 | 0.1%
44 | 141 | 0.1%
31 | 127 | 0.1%
35 | 96 | 0.1%
46 | 94 | 0.1%
69 | 89 | 0.1%
63 | 83 | 0.1%
50 | 80 | 0.1%
Other values (118) | 2597 | 2.2%

Value | Count | Frequency (%)
0 | 115692 | 96.9%
1 | 12 | < 0.1%
2 | 5 | < 0.1%
3 | 59 | < 0.1%
4 | 25 | < 0.1%

Value | Count | Frequency (%)
391 | 45 | < 0.1%
379 | 15 | < 0.1%
330 | 15 | < 0.1%
259 | 10 | < 0.1%
236 | 35 | < 0.1%

customer_type
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
Transient | 89613 | 75.1%
Transient-Party | 25124 | 21.0%
Contract | 4076 | 3.4%
Group | 577 | 0.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 15
Median length: 9
Mean length: 10.20914649
Min length: 5

adr
Real number (ℝ)

ZEROS

Distinct: 8879
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 101.8311215
Minimum: -6.38
Maximum: 5400
Zeros: 1959
Zeros (%): 1.6%
Memory size: 932.7 KiB

Quantile statistics

Minimum: -6.38
5-th percentile: 38.4
Q1: 69.29
Median: 94.575
Q3: 126
95-th percentile: 193.5
Maximum: 5400
Range: 5406.38
Interquartile range (IQR): 56.71

Descriptive statistics

Standard deviation: 50.53579029
Coefficient of variation (CV): 0.4962705853
Kurtosis: 1013.189851
Mean: 101.8311215
Median Absolute Deviation (MAD): 27.825
Skewness: 10.53021398
Sum: 12157617.6
Variance: 2553.8661
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
62 | 3754 | 3.1%
75 | 2715 | 2.3%
90 | 2473 | 2.1%
65 | 2418 | 2.0%
0 | 1959 | 1.6%
80 | 1889 | 1.6%
95 | 1661 | 1.4%
120 | 1607 | 1.3%
100 | 1573 | 1.3%
85 | 1538 | 1.3%
Other values (8869) | 97803 | 81.9%

Value | Count | Frequency (%)
-6.38 | 1 | < 0.1%
0 | 1959 | 1.6%
0.26 | 1 | < 0.1%
0.5 | 1 | < 0.1%
1 | 15 | < 0.1%

Value | Count | Frequency (%)
5400 | 1 | < 0.1%
510 | 1 | < 0.1%
508 | 1 | < 0.1%
451.5 | 1 | < 0.1%
450 | 1 | < 0.1%

required_car_parking_spaces
Real number (ℝ≥0)

ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06251779881
Minimum: 0
Maximum: 8
Zeros: 111974
Zeros (%): 93.8%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2452911475
Coefficient of variation (CV): 3.92354101
Kurtosis: 29.99805617
Mean: 0.06251779881
Median Absolute Deviation (MAD): 0
Skewness: 4.163233238
Sum: 7464
Variance: 0.06016774703
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=5)

Value | Count | Frequency (%)
0 | 111974 | 93.8%
1 | 7383 | 6.2%
2 | 28 | < 0.1%
3 | 3 | < 0.1%
8 | 2 | < 0.1%

Value | Count | Frequency (%)
0 | 111974 | 93.8%
1 | 7383 | 6.2%
2 | 28 | < 0.1%
3 | 3 | < 0.1%
8 | 2 | < 0.1%

Value | Count | Frequency (%)
8 | 2 | < 0.1%
3 | 3 | < 0.1%
2 | 28 | < 0.1%
1 | 7383 | 6.2%
0 | 111974 | 93.8%

total_of_special_requests
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5713627607
Minimum: 0
Maximum: 5
Zeros: 70318
Zeros (%): 58.9%
Memory size: 932.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 2
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7927984228
Coefficient of variation (CV): 1.387557043
Kurtosis: 1.492564811
Mean: 0.5713627607
Median Absolute Deviation (MAD): 0
Skewness: 1.349189377
Sum: 68215
Variance: 0.6285293392
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Value | Count | Frequency (%)
0 | 70318 | 58.9%
1 | 33226 | 27.8%
2 | 12969 | 10.9%
3 | 2497 | 2.1%
4 | 340 | 0.3%
5 | 40 | < 0.1%

Value | Count | Frequency (%)
0 | 70318 | 58.9%
1 | 33226 | 27.8%
2 | 12969 | 10.9%
3 | 2497 | 2.1%
4 | 340 | 0.3%

Value | Count | Frequency (%)
5 | 40 | < 0.1%
4 | 340 | 0.3%
3 | 2497 | 2.1%
2 | 12969 | 10.9%
1 | 33226 | 27.8%

reservation_status
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
Check-Out | 75166 | 63.0%
Canceled | 43017 | 36.0%
No-Show | 1207 | 1.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 9
Mean length: 8.619473993
Min length: 7

reservation_status_date
Categorical

HIGH CARDINALITY

Distinct: 926
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 932.7 KiB

Value | Count | Frequency (%)
2015-10-21 | 1461 | 1.2%
2015-07-06 | 805 | 0.7%
2016-11-25 | 790 | 0.7%
2015-01-01 | 763 | 0.6%
2016-01-18 | 625 | 0.5%
2015-07-02 | 469 | 0.4%
2016-12-07 | 450 | 0.4%
2015-12-18 | 423 | 0.4%
2016-02-09 | 412 | 0.3%
2016-04-04 | 382 | 0.3%
Other values (916) | 112810 | 94.5%

Frequencies of value counts

Unique

Unique: 28
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line does not affect r (only its sign does).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
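
A minimal pure-Python sketch of this calculation is shown below. The function name and the toy data are illustrative only and are not taken from the report's implementation.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and standard deviations cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
print(pearson_r(x, y))
# Shifting and rescaling y (with a positive factor) leaves r unchanged.
print(pearson_r(x, [10 * v + 7 for v in y]))
```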

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
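
As a sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. Giving tied values the average of their ranks is an assumption made here, and the helper names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
```

On the toy data, ρ is exactly 1 because the relationship is monotonic, while r stays below 1 because it is not linear.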

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
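
The definition above, taken literally, corresponds to the variant usually called τ-a. The following sketch implements exactly that; library implementations often use a tie-adjusted variant instead, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, ignoring tie corrections."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)

# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```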

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
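
For illustration, here is a pure-Python sketch of a bias-corrected Cramér's V following the correction commonly attributed to Bergsma (2013): the chi-squared based φ² and the table dimensions are each shrunk by a term in 1/(n - 1). It is not the report's own code, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Toy labels: one perfectly associated pair and one independent pair.
hotel = ["City"] * 50 + ["Resort"] * 50
code = ["C"] * 50 + ["R"] * 50
print(cramers_v_corrected(hotel, code))
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["c", "d", "c", "d"] * 25))
```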

Missing values

Sample

First rows

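index | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | children | babies | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date
0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | 0.0 | 0 | BB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 3 | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01
1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | 0.0 | 0 | BB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 4 | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01
2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | 0.0 | 0 | BB | GBR | Direct | Direct | 0 | 0 | 0 | A | C | 0 | No Deposit | NaN | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02
3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | 0.0 | 0 | BB | GBR | Corporate | Corporate | 0 | 0 | 0 | A | A | 0 | No Deposit | 304.0 | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02
4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | BB | GBR | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03
5 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | BB | GBR | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03
6 | Resort Hotel | 0 | 0 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | BB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 0 | No Deposit | NaN | NaN | 0 | Transient | 107.0 | 0 | 0 | Check-Out | 2015-07-03
7 | Resort Hotel | 0 | 9 | 2015 | July | 27 | 1 | 0 | 2 | 2 | 0.0 | 0 | FB | PRT | Direct | Direct | 0 | 0 | 0 | C | C | 0 | No Deposit | 303.0 | NaN | 0 | Transient | 103.0 | 0 | 1 | Check-Out | 2015-07-03
8 | Resort Hotel | 1 | 85 | 2015 | July | 27 | 1 | 0 | 3 | 2 | 0.0 | 0 | BB | PRT | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 240.0 | NaN | 0 | Transient | 82.0 | 0 | 1 | Canceled | 2015-05-06
9 | Resort Hotel | 1 | 75 | 2015 | July | 27 | 1 | 0 | 3 | 2 | 0.0 | 0 | HB | PRT | Offline TA/TO | TA/TO | 0 | 0 | 0 | D | D | 0 | No Deposit | 15.0 | NaN | 0 | Transient | 105.5 | 0 | 0 | Canceled | 2015-04-22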

Last rows

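index | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | children | babies | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date
119380 | City Hotel | 0 | 44 | 2017 | August | 35 | 31 | 1 | 3 | 2 | 0.0 | 0 | SC | DEU | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 140.75 | 0 | 1 | Check-Out | 2017-09-04
119381 | City Hotel | 0 | 188 | 2017 | August | 35 | 31 | 2 | 3 | 2 | 0.0 | 0 | BB | DEU | Direct | Direct | 0 | 0 | 0 | A | A | 0 | No Deposit | 14.0 | NaN | 0 | Transient | 99.00 | 0 | 0 | Check-Out | 2017-09-05
119382 | City Hotel | 0 | 135 | 2017 | August | 35 | 30 | 2 | 4 | 3 | 0.0 | 0 | BB | JPN | Online TA | TA/TO | 0 | 0 | 0 | G | G | 0 | No Deposit | 7.0 | NaN | 0 | Transient | 209.00 | 0 | 0 | Check-Out | 2017-09-05
119383 | City Hotel | 0 | 164 | 2017 | August | 35 | 31 | 2 | 4 | 2 | 0.0 | 0 | BB | DEU | Offline TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 42.0 | NaN | 0 | Transient | 87.60 | 0 | 0 | Check-Out | 2017-09-06
119384 | City Hotel | 0 | 21 | 2017 | August | 35 | 30 | 2 | 5 | 2 | 0.0 | 0 | BB | BEL | Offline TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 394.0 | NaN | 0 | Transient | 96.14 | 0 | 2 | Check-Out | 2017-09-06
119385 | City Hotel | 0 | 23 | 2017 | August | 35 | 30 | 2 | 5 | 2 | 0.0 | 0 | BB | BEL | Offline TA/TO | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 394.0 | NaN | 0 | Transient | 96.14 | 0 | 0 | Check-Out | 2017-09-06
119386 | City Hotel | 0 | 102 | 2017 | August | 35 | 31 | 2 | 5 | 3 | 0.0 | 0 | BB | FRA | Online TA | TA/TO | 0 | 0 | 0 | E | E | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 225.43 | 0 | 2 | Check-Out | 2017-09-07
119387 | City Hotel | 0 | 34 | 2017 | August | 35 | 31 | 2 | 5 | 2 | 0.0 | 0 | BB | DEU | Online TA | TA/TO | 0 | 0 | 0 | D | D | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 157.71 | 0 | 4 | Check-Out | 2017-09-07
119388 | City Hotel | 0 | 109 | 2017 | August | 35 | 31 | 2 | 5 | 2 | 0.0 | 0 | BB | GBR | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 89.0 | NaN | 0 | Transient | 104.40 | 0 | 0 | Check-Out | 2017-09-07
119389 | City Hotel | 0 | 205 | 2017 | August | 35 | 29 | 2 | 7 | 2 | 0.0 | 0 | HB | DEU | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 9.0 | NaN | 0 | Transient | 151.20 | 0 | 2 | Check-Out | 2017-09-07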

Duplicate rows

Most frequent

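index | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | children | babies | meal | country | market_segment | distribution_channel | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | reserved_room_type | assigned_room_type | booking_changes | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date | count
1 | City Hotel | 0 | 256 | 2016 | October | 43 | 16 | 2 | 3 | 2 | 0.0 | 0 | BB | DEU | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 9.0 | 333.0 | 0 | Transient-Party | 100.75 | 0 | 0 | Check-Out | 2016-10-21 | 7
4 | Resort Hotel | 0 | 24 | 2015 | November | 45 | 3 | 3 | 10 | 1 | 0.0 | 0 | BB | FRA | Corporate | Corporate | 0 | 0 | 0 | A | A | 2 | No Deposit | 334.0 | 281.0 | 0 | Transient-Party | 40.00 | 0 | 0 | Check-Out | 2015-11-16 | 5
14 | Resort Hotel | 0 | 36 | 2015 | November | 45 | 7 | 2 | 6 | 1 | 0.0 | 0 | BB | DEU | Corporate | Corporate | 0 | 0 | 0 | A | A | 1 | No Deposit | 185.0 | 281.0 | 0 | Transient-Party | 36.00 | 0 | 0 | Check-Out | 2015-11-15 | 4
13 | Resort Hotel | 0 | 36 | 2015 | November | 45 | 7 | 2 | 6 | 1 | 0.0 | 0 | BB | AUT | Corporate | Corporate | 0 | 0 | 0 | A | A | 1 | No Deposit | 185.0 | 281.0 | 0 | Transient-Party | 36.00 | 0 | 0 | Check-Out | 2015-11-15 | 3
0 | City Hotel | 0 | 0 | 2015 | August | 33 | 13 | 0 | 2 | 2 | 0.0 | 0 | BB | PRT | Online TA | TA/TO | 0 | 0 | 0 | A | B | 0 | No Deposit | 9.0 | 9.0 | 0 | Transient | 85.00 | 0 | 0 | Check-Out | 2015-08-15 | 2
2 | City Hotel | 0 | 256 | 2016 | October | 43 | 16 | 2 | 3 | 2 | 0.0 | 0 | BB | DEU | Online TA | TA/TO | 0 | 0 | 0 | A | A | 1 | No Deposit | 9.0 | 333.0 | 0 | Transient-Party | 100.75 | 0 | 0 | Check-Out | 2016-10-21 | 2
3 | Resort Hotel | 0 | 5 | 2017 | January | 1 | 2 | 1 | 3 | 1 | 0.0 | 0 | BB | PRT | Online TA | TA/TO | 0 | 0 | 0 | A | A | 0 | No Deposit | 314.0 | 29.0 | 0 | Transient-Party | 40.40 | 1 | 1 | Check-Out | 2017-01-06 | 2
5 | Resort Hotel | 0 | 24 | 2015 | November | 45 | 3 | 3 | 10 | 2 | 0.0 | 0 | BB | ITA | Corporate | Corporate | 0 | 0 | 0 | A | A | 1 | No Deposit | 326.0 | 281.0 | 0 | Transient | 48.00 | 0 | 0 | Check-Out | 2015-11-16 | 2
6 | Resort Hotel | 0 | 24 | 2015 | October | 44 | 26 | 7 | 15 | 1 | 0.0 | 0 | BB | AUT | Corporate | Corporate | 0 | 0 | 0 | E | G | 2 | No Deposit | 185.0 | 281.0 | 0 | Transient-Party | 52.20 | 0 | 0 | Check-Out | 2015-11-17 | 2
7 | Resort Hotel | 0 | 27 | 2015 | November | 45 | 6 | 2 | 7 | 1 | 0.0 | 0 | BB | FRA | Corporate | Corporate | 0 | 0 | 0 | A | A | 1 | No Deposit | 334.0 | 281.0 | 0 | Transient-Party | 40.00 | 0 | 0 | Check-Out | 2015-11-15 | 2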